Unsupervised Word Alignment Using Frequency Constraint in Posterior Regularized EM
Authors
Abstract
Generative word alignment models, such as the IBM Models, are restricted to one-to-many alignments and cannot explicitly represent many-to-many relationships in a bilingual text. The problem is partially solved either by introducing heuristics or by agreement constraints that force the two directional word alignments to agree with each other. In this paper, we focus on the posterior regularization framework (Ganchev et al., 2010), which can force two directional word alignment models to agree with each other during training, and propose new constraints that take into account the difference between function words and content words. Experimental results on French-to-English and Japanese-to-English alignment tasks show statistically significant gains over the previous posterior regularization baseline. We also observed gains in Japanese-to-English translation tasks, which demonstrates the effectiveness of our methods on grammatically divergent language pairs.
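For context, a minimal sketch of the posterior-regularized E-step in the bidirectional agreement setting of Ganchev et al. (2010); the notation ($q$, $\mathcal{Q}$, $\phi$) is ours, and the frequency-based function-word/content-word constraints proposed in this paper are not shown:

\begin{align*}
  q^{*} &= \arg\min_{q \in \mathcal{Q}} \; \mathrm{KL}\!\left(q(\mathbf{z}) \,\big\|\, p_{\theta}(\mathbf{z} \mid \mathbf{x})\right), \\
  \mathcal{Q} &= \Bigl\{\, q \;:\; \mathbb{E}_{q}\bigl[\phi^{\rightarrow}_{ij}(\mathbf{z})\bigr] = \mathbb{E}_{q}\bigl[\phi^{\leftarrow}_{ij}(\mathbf{z})\bigr] \ \text{ for all word pairs } (i,j) \,\Bigr\},
\end{align*}

where $\phi^{\rightarrow}_{ij}$ and $\phi^{\leftarrow}_{ij}$ indicate that source word $i$ is linked to target word $j$ under the forward and reverse models, and $q^{*}$ replaces the model posterior when expected counts are collected for the M-step.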
Similar resources
Large-scale Word Alignment Using Soft Dependency Cohesion Constraints
Dependency cohesion refers to the observation that phrases dominated by disjoint dependency subtrees in the source language generally do not overlap in the target language. It has been verified to be a useful constraint for word alignment. However, previous work either treats this as a hard constraint or uses it as a feature in discriminative models, which is ineffective for large-scale tasks. ...
Unsupervised Word Alignment by Agreement Under ITG Constraint
We propose a novel unsupervised word alignment method that uses a constraint based on Inversion Transduction Grammar (ITG) parse trees to jointly unify two directional models. Previous agreement methods are not helpful for locating long-distance alignments because they do not use any syntactic structure. In contrast, the proposed method symmetrizes alignments in consideration of their st...
Online EM for Unsupervised Models
The (batch) EM algorithm plays an important role in unsupervised induction, but it sometimes suffers from slow convergence. In this paper, we show that online variants (1) provide significant speedups and (2) can even find better solutions than those found by batch EM. We support these findings on four unsupervised tasks: part-of-speech tagging, document classification, word segmentation, and w...
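As a rough illustration of such online variants (a common stepwise-EM update rule, which may differ in detail from the one evaluated in that paper), the expected sufficient statistics are interpolated one example at a time instead of being recomputed over the whole corpus:

\begin{equation*}
  \mu^{(k+1)} = (1 - \eta_k)\,\mu^{(k)} + \eta_k\, s_i\bigl(\theta^{(k)}\bigr), \qquad \eta_k = (k + 2)^{-\alpha}, \quad \alpha \in (0.5, 1],
\end{equation*}

where $s_i(\theta^{(k)})$ are the expected sufficient statistics of example $i$ under the current parameters, and $\theta^{(k+1)}$ is re-estimated from the running average $\mu^{(k+1)}$ after each update.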
A Framework for Tuning Posterior Entropy in Unsupervised Learning
We present a general framework for unsupervised and semi-supervised learning containing a graded spectrum of Expectation Maximization (EM) algorithms. We call our framework Unified Expectation Maximization (UEM). UEM allows us to tune the entropy of the inferred posterior distribution during the E-step to impact the quality of learning. Furthermore, UEM covers existing algorithms like standard ...
Feature-Based ITG for Unsupervised Word Alignment
Inversion transduction grammar (ITG) [1] is an effective constraint on the word alignment search space. However, the traditional unsupervised ITG word alignment model is incapable of utilizing rich features. In this paper, we propose a novel feature-based unsupervised ITG word alignment model. With the...
Publication date: 2014